Search | BVS Regional Portal

Convolutions are competitive with transformers for protein sequence pretraining.

Yang, Kevin K; Fusi, Nicolo; Lu, Alex X.

Cell Syst ; 15(3): 286-294.e2, 2024 Mar 20.

Article in English | MEDLINE | ID: mdl-38428432

ABSTRACT

Pretrained protein sequence language models have been shown to improve the performance of many prediction tasks and are now routinely integrated into bioinformatics tools. However, these models largely rely on the transformer architecture, which scales quadratically with sequence length in both run-time and memory. Therefore, state-of-the-art models have limitations on sequence length. To address this limitation, we investigated whether convolutional neural network (CNN) architectures, which scale linearly with sequence length, could be as effective as transformers in protein language models. With masked language model pretraining, CNNs are competitive with, and occasionally superior to, transformers across downstream applications while maintaining strong performance on sequences longer than those allowed in the current state-of-the-art transformer models. Our work suggests that computational efficiency can be improved without sacrificing performance, simply by using a CNN architecture instead of a transformer, and emphasizes the importance of disentangling pretraining task and model architecture. A record of this paper's transparent peer review process is included in the supplemental information.
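The scaling argument in this abstract can be illustrated with a rough count. The sketch below is not from the paper: it assumes one self-attention layer compares every pair of positions (length squared) and one 1D convolution layer touches a fixed window per position (length times kernel size), and it ignores constants, channel widths, and layer counts. The function names and the kernel size of 5 are hypothetical.

```python
def attention_pair_count(seq_len: int) -> int:
    """Pairwise position comparisons in one full self-attention layer."""
    return seq_len * seq_len


def convolution_window_count(seq_len: int, kernel_size: int = 5) -> int:
    """Position-by-offset products in one 1D convolution layer ('same' padding assumed)."""
    return seq_len * kernel_size


if __name__ == "__main__":
    # Doubling the sequence length quadruples the attention count
    # but only doubles the convolution count.
    for seq_len in (512, 1024, 2048):
        pairs = attention_pair_count(seq_len)
        windows = convolution_window_count(seq_len)
        print(f"length={seq_len}: attention={pairs}, convolution={windows}")
```

Under these assumptions the gap between the two counts widens with sequence length, which is the regime the abstract targets.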

Subjects

Computational Biology ; Neural Networks, Computer ; Amino Acid Sequence ; Peer Review

Protein structure generation via folding diffusion.

Wu, Kevin E; Yang, Kevin K; van den Berg, Rianne; Alamdari, Sarah; Zou, James Y; Lu, Alex X; Amini, Ava P.

Nat Commun ; 15(1): 1059, 2024 Feb 05.

Article in English | MEDLINE | ID: mdl-38316764

ABSTRACT

The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting yet incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures from neural networks remains difficult. In this work, we present a diffusion-based generative model that generates protein backbone structures via a procedure inspired by the natural folding process. We describe a protein backbone structure as a sequence of angles capturing the relative orientation of the constituent backbone atoms, and generate structures by denoising from a random, unfolded state towards a stable folded structure. Not only does this mirror how proteins natively twist into energetically favorable conformations, the inherent shift and rotational invariance of this representation crucially alleviates the need for more complex equivariant networks. We train a denoising diffusion probabilistic model with a simple transformer backbone and demonstrate that our resulting model unconditionally generates highly realistic protein structures with complexity and structural patterns akin to those of naturally-occurring proteins. As a useful resource, we release an open-source codebase and trained models for protein structure diffusion.
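A minimal sketch of the angle representation, assuming plain Gaussian noise that is wrapped back onto the circle. It shows only the corruption direction (folded angles drifting toward a random state), not the learned denoiser. The abstract does not give the noise schedule, so the `sigma` values, function names, and example angles are hypothetical.

```python
import math
import random


def wrap_angle(theta: float) -> float:
    """Map an angle in radians onto the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def noise_angles(angles, sigma, rng):
    """Add independent Gaussian noise to each angle, then wrap the result."""
    return [wrap_angle(a + rng.gauss(0.0, sigma)) for a in angles]


if __name__ == "__main__":
    rng = random.Random(0)
    backbone = [-1.0, 2.5, 3.0, -2.8]  # made-up angles in radians, not real protein data
    for sigma in (0.1, 1.0, 3.0):
        # Larger sigma pushes the angles closer to a uniform, "unfolded" state.
        print(sigma, [round(a, 2) for a in noise_angles(backbone, sigma, rng)])
```

Because every value is an angle relative to neighboring atoms, translating or rotating the whole structure leaves the list unchanged, which is the invariance the abstract refers to.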

Subjects

Protein Folding ; Proteins ; Proteins/metabolism ; Neural Networks, Computer ; Protein Conformation

Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning.

Lu, Alex X; Lu, Amy X; Pritisanac, Iva; Zarin, Taraneh; Forman-Kay, Julie D; Moses, Alan M.

PLoS Comput Biol ; 18(6): e1010238, 2022 06.

Article in English | MEDLINE | ID: mdl-35767567

ABSTRACT

A major challenge to the characterization of intrinsically disordered regions (IDRs), which are widespread in the proteome, but relatively poorly understood, is the identification of molecular features that mediate functions of these regions, such as short motifs, amino acid repeats and physicochemical properties. Here, we introduce a proteome-scale feature discovery approach for IDRs. Our approach, which we call "reverse homology", exploits the principle that important functional features are conserved over evolution. We use this as a contrastive learning signal for deep learning: given a set of homologous IDRs, the neural network has to correctly choose a held-out homolog from another set of IDRs sampled randomly from the proteome. We pair reverse homology with a simple architecture and standard interpretation techniques, and show that the network learns conserved features of IDRs that can be interpreted as motifs, repeats, or bulk features like charge or amino acid propensities. We also show that our model can be used to produce visualizations of what residues and regions are most important to IDR function, generating hypotheses for uncharacterized IDRs. Our results suggest that feature discovery using unsupervised neural networks is a promising avenue to gain systematic insight into poorly understood protein sequences.
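The "choose the held-out homolog" task can be written as a contrastive loss. The sketch below assumes a softmax cross-entropy over dot-product scores (an InfoNCE-style objective). The abstract does not state the exact loss, so this form, the function names, and the toy vectors are assumptions for illustration only.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(query, candidates, positive_index):
    """Cross-entropy of picking the true homolog among the candidates.

    query: embedding summarizing a set of homologous IDRs.
    candidates: embeddings of the held-out homolog plus random IDRs.
    positive_index: position of the held-out homolog in `candidates`.
    """
    scores = [dot(query, c) for c in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_norm - scores[positive_index]


if __name__ == "__main__":
    query = [1.0, 0.0]  # toy embedding, not learned from data
    candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    print(contrastive_loss(query, candidates, positive_index=0))
```

The loss falls as the query scores the true homolog higher than the randomly sampled IDRs, so minimizing it rewards features shared among homologs.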

Subjects

Intrinsically Disordered Proteins ; Proteome ; Amino Acid Sequence ; Evolution, Molecular ; Intrinsically Disordered Proteins/chemistry ; Protein Conformation ; Proteome/metabolism

Learning unsupervised feature representations for single cell microscopy images with paired cell inpainting.

Lu, Alex X; Kraus, Oren Z; Cooper, Sam; Moses, Alan M.

PLoS Comput Biol ; 15(9): e1007348, 2019 09.

Article in English | MEDLINE | ID: mdl-31479439

ABSTRACT

Cellular microscopy images contain rich insights about biology. To extract this information, researchers use features, or measurements of the patterns of interest in the images. Here, we introduce a convolutional neural network (CNN) to automatically design features for fluorescence microscopy. We use a self-supervised method to learn feature representations of single cells in microscopy images without labelled training data. We train CNNs on a simple task that leverages the inherent structure of microscopy images and controls for variation in cell morphology and imaging: given one cell from an image, the CNN is asked to predict the fluorescence pattern in a second different cell from the same image. We show that our method learns high-quality features that describe protein expression patterns in single cells in both yeast and human microscopy datasets. Moreover, we demonstrate that our features are useful for exploratory biological analysis, by capturing high-resolution cellular components in a proteome-wide cluster analysis of human proteins, and by quantifying multi-localized proteins and single-cell variability. We believe paired cell inpainting is a generalizable method to obtain feature representations of single cells in multichannel microscopy images.

Subjects

Microscopy/methods ; Single-Cell Analysis/methods ; Unsupervised Machine Learning ; Cells, Cultured ; Computational Biology ; Humans ; Image Processing, Computer-Assisted/methods ; Neural Networks, Computer ; Yeasts/cytology

YeastSpotter: accurate and parameter-free web segmentation for microscopy images of yeast cells.

Lu, Alex X; Zarin, Taraneh; Hsu, Ian S; Moses, Alan M.

Bioinformatics ; 35(21): 4525-4527, 2019 11 01.

Article in English | MEDLINE | ID: mdl-31095270

ABSTRACT

SUMMARY: We introduce YeastSpotter, a web application for the segmentation of yeast microscopy images into single cells. YeastSpotter is user-friendly and generalizable, reducing the computational expertise required for this critical preprocessing step in many image analysis pipelines. AVAILABILITY AND IMPLEMENTATION: YeastSpotter is available at http://yeastspotter.csb.utoronto.ca/. Code is available at https://github.com/alexxijielu/yeast_segmentation. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Microscopy ; Software ; Cell Count ; Saccharomyces cerevisiae

Integrating images from multiple microscopy screens reveals diverse patterns of change in the subcellular localization of proteins.

Lu, Alex X; Chong, Yolanda T; Hsu, Ian Shen; Strome, Bob; Handfield, Louis-Francois; Kraus, Oren; Andrews, Brenda J; Moses, Alan M.

Elife ; 7, 2018 04 05.

Article in English | MEDLINE | ID: mdl-29620521

ABSTRACT

The evaluation of protein localization changes on a systematic level is a powerful tool for understanding how cells respond to environmental, chemical, or genetic perturbations. To date, work in understanding these proteomic responses through high-throughput imaging has catalogued localization changes independently for each perturbation. To distinguish changes that are targeted responses to the specific perturbation or more generalized programs, we developed a scalable approach to visualize the localization behavior of proteins across multiple experiments as a quantitative pattern. By applying this approach to 24 experimental screens consisting of nearly 400,000 images, we differentiated specific responses from more generalized ones, discovered nuance in the localization behavior of stress-responsive proteins, and formed hypotheses by clustering proteins that have similar patterns. Previous approaches aim to capture all localization changes for a single screen as accurately as possible, whereas our work aims to integrate large amounts of imaging data to find unexpected new cell biology.
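One way to picture "localization behavior across multiple experiments as a quantitative pattern" is a per-protein vector with one change score per screen, compared by a similarity measure. The sketch below assumes cosine similarity. The abstract does not name the metric or the clustering method, and the protein names and scores are invented for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length profiles; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(profiles, name):
    """Return the other protein whose profile is closest to `name` by cosine similarity."""
    others = [p for p in profiles if p != name]
    return max(others, key=lambda p: cosine_similarity(profiles[name], profiles[p]))


if __name__ == "__main__":
    # Hypothetical change scores across four screens; names and values are invented.
    profiles = {
        "protein_A": [0.9, 0.8, 0.0, 0.1],
        "protein_B": [1.0, 0.7, 0.1, 0.0],
        "protein_C": [0.0, 0.1, 0.9, 0.8],
    }
    print(most_similar(profiles, "protein_A"))
```

Proteins that respond in the same screens end up with high similarity, which is the kind of grouping that can suggest shared roles.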

Subjects

Image Processing, Computer-Assisted/methods ; Microscopy, Fluorescence/methods ; Proteome/metabolism ; Saccharomyces cerevisiae Proteins/metabolism ; Saccharomyces cerevisiae/metabolism ; Subcellular Fractions/metabolism ; Computational Biology/methods ; Gene Ontology ; High-Throughput Screening Assays ; Humans ; Protein Transport ; Proteome/analysis ; Saccharomyces cerevisiae/genetics ; Saccharomyces cerevisiae/growth & development ; Saccharomyces cerevisiae Proteins/genetics

Extracting and Integrating Protein Localization Changes from Multiple Image Screens of Yeast Cells.

Lu, Alex X; Handfield, Louis-Francois; Moses, Alan M.

Bio Protoc ; 8(18): e3022, 2018 Sep 20.

Article in English | MEDLINE | ID: mdl-34395810

ABSTRACT

The evaluation of protein localization changes in cells under diverse chemical and genetic perturbations is now possible due to the increasing quantity of screens that systematically image thousands of proteins in an organism. Integrating information from different screens provides valuable contextual information about the protein function. For example, proteins that change localization in response to many different stressful environmental perturbations may have different roles than those that only change in response to a few. We developed, to our knowledge, the first protocol that permits the quantitative comparison and clustering of protein localization changes across multiple screens. Our analysis allows for the exploratory analysis of proteins according to their pattern of localization changes across many different perturbations, potentially discovering new roles by association.
